| Name | Id | Email |
|---|---|---|
| Shachar Zafran | 319002721 | shaharzafran@campus.technion.ac.il |
| Dorin Shteyman | 206721102 | dorin.sh@campus.technion.ac.il |
In this part you'll implement a small comparative-analysis project, heavily based on the materials from the tutorials and homework.
The supporting files are located in the project/ directory. You can import these files here, as we do for the homeworks.

TACO is a growing image dataset of waste in the wild. It contains images of litter taken in diverse environments: woods, roads and beaches.

You can read more about the dataset here: https://github.com/pedropro/TACO
You can explore the data distribution and how to load it here: https://github.com/pedropro/TACO/blob/master/demo.ipynb
The stable version of the dataset, containing 1500 images and 4787 annotations, exists in datasets/TACO-master.
You do not need to download the dataset (but you do need to install it).
You can review good models for the COCO object-detection task as a reference:
SOTA: https://paperswithcode.com/sota/object-detection-on-coco
Real-Time: https://paperswithcode.com/sota/real-time-object-detection-on-coco
Or you can use older models like YOLOv3 or Faster R-CNN.
Good luck!
TODO: This is where you should write your explanations and implement the code to display the results. See guidelines about what to include in this section.
from IPython.display import Image, display
import matplotlib.pyplot as plt
import numpy as np
import cv2
import glob
import json
2.1. Code structure & how to run it
3.1. TACO class distribution inspection
4.1. First experiment - Optimizers
4.1.1. Motivation & overview on the tested optimizers
4.1.2. Code modifications for the experiment
4.1.3. Results - numerical and visual
4.1.4. Experiment conclusions - optimizers
4.2. Second experiment - IoU thresholds
4.2.1. Motivation & overview on IoU
4.2.2. Code modifications for the experiment
4.2.3. Results - numerical and visual
4.2.4. Experiment conclusions - IoU threshold
4.3. Third experiment - Fine tuning (layers freezing)
4.3.1. Motivation & overview on Fine tuning
4.3.2. Code modifications for the experiment
4.3.3. Results - numerical and visual
4.3.4. Experiment conclusions - Fine tuning (layers freezing)
After conducting some research, we've decided to use the latest YOLO model - YOLOv7 as our object detection model.
YOLOv7 is a SOTA (state-of-the-art) object detection model. It is the latest version of the You Only Look Once (YOLO) family of object detection models, which, unlike previous methods, uses a single neural network to perform object detection and classification by dividing the input image into a grid of cells and predicting bounding boxes and object classes for each cell. This results in a faster and more efficient detection process.
Compared to previous versions, YOLOv7 has several significant improvements. YOLOv7 has a simplified architecture compared to YOLOv5 and YOLOv4, reducing the number of layers and parameters in the model and making training faster. YOLOv7 also introduces new techniques to improve accuracy, such as a focal loss function that prioritizes difficult examples during training. Additionally, YOLOv7 supports multi-scale training, which improves detection of objects of various sizes and ratios.
Four notable improvements in YOLOv7 compared to older YOLO architectures are:
Based on previous research that considered both the amount of memory needed to keep layers in memory and the distance a gradient must back-propagate through the layers, it was concluded that reducing the gradient distance makes the network's learning more powerful. Thus, in YOLOv7 the chosen layer aggregation form is E-ELAN, an extended version of the ELAN computational block.
Re-parameterization means averaging a set of model weights to make a model more robust to general patterns. Recently, a module-level re-parameterization technique was presented, where pieces of the network use their own re-parameterization strategies. In YOLOv7, gradient flow propagation paths are used to determine which modules of the network should use re-parameterization strategies and which shouldn't.
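As a toy illustration of weight averaging, the sketch below averages two hypothetical checkpoints element-wise (the checkpoint dictionaries and layer names are invented for this example; YOLOv7's actual module-level re-parameterization is more involved):

```python
import numpy as np

def average_checkpoints(weight_sets):
    """Average several sets of weights (e.g. checkpoints from
    different training epochs) element-wise, per layer."""
    return {name: np.mean([w[name] for w in weight_sets], axis=0)
            for name in weight_sets[0]}

# Toy example: two "checkpoints" of a single (hypothetical) conv layer
ckpt_a = {"conv1.weight": np.array([1.0, 2.0, 3.0])}
ckpt_b = {"conv1.weight": np.array([3.0, 4.0, 5.0])}
averaged = average_checkpoints([ckpt_a, ckpt_b])
print(averaged["conv1.weight"])  # [2. 3. 4.]
```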
The YOLO head outputs the final predictions, but it is far downstream. The added auxiliary head sits in the middle of the network and is supervised during training just like the regular head at the end.
Training the auxiliary head is less efficient, because the network between it and the prediction is shallower; therefore, coarse-to-fine auxiliary head supervision was chosen for YOLOv7. The coarse-to-fine supervision auxiliary head operates in two stages. In the first stage, the input image is divided into smaller grids, and object detection is done for each grid by itself. In the second stage, the outputs from each grid are aggregated to produce the final detection.
It also includes a feedback mechanism - where the intermediate detection results are used to improve the detection performance of the next stages. This feedback mechanism helps to refine the detections, especially for smaller objects.
Previous versions of YOLO used a standard cross-entropy loss function, which is known to be less effective at detecting small objects. Focal loss overcomes this issue by down-weighting the loss for well-classified examples and focusing on the problematic examples and the objects that are hard to detect. The focal loss formula:
FL(p_t) = -α_t * (1 - p_t)^γ * log(p_t)
where:
p_t is the predicted probability for the true class, α_t is the weighting factor for the true class (α_t = 1 for most cases), γ is the focusing parameter (typically γ > 0). When γ = 0, the focal loss becomes the cross-entropy loss.
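The formula above can be sketched directly in NumPy (a minimal illustration of the loss formula, not YOLOv7's actual loss code):

```python
import numpy as np

def focal_loss(p_t, alpha_t=1.0, gamma=2.0):
    """Focal loss for the predicted probability of the true class.
    With gamma = 0 it reduces to the cross-entropy -log(p_t)."""
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A well-classified example (p_t = 0.9) is strongly down-weighted,
# while a hard example (p_t = 0.1) keeps most of its loss.
easy, hard = focal_loss(0.9), focal_loss(0.1)
print(easy, hard)
# Compared to plain cross-entropy (gamma = 0):
print(focal_loss(0.9, gamma=0.0))  # == -log(0.9)
```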
Our baseline model was created by:
Cloning YOLOv7's GitHub repository [1] (configured for 7 classes), pretrained on the versatile and large COCO dataset.
Installing "requirements.txt" file (as shown in /project/SetupEnv.ipynb)
Downloading the YOLOv7 starting weights from pretraining on the COCO dataset: "yolov7_training.pt" (as shown in /project/SetupEnv.ipynb)
Downloading the TACO dataset for trash detection of 7 classes [2], both train and validation (as shown in /project/MiniProject.ipynb)
Running the following line on the Lambda server:
sbatch -c 2 --gres=gpu:1 -o train.out -J train train.sh
Baseline model configurations:
To train the baseline model, we set:
As for the rest of the necessary arguments, we use their predefined default arguments.
Note: we changed the number of epochs from 300 to 30 because we fine-tune a YOLOv7 that is already pretrained on the COCO dataset, making the model task-specific to trash detection.
Note 2: the batch size was changed from 16 to 8 due to computation and space limits of the Lambda server.
SGD optimizer, no frozen layers, IoU training threshold of 0.2
The TACO trash-detection dataset is a subset of the TACO dataset that focuses specifically on detecting trash objects in images. It contains over 6,000 images, each annotated with bounding boxes for 60 different types of trash objects, such as plastic bottles, cans, wrappers, and more.
For the sake of this project, we used the dataset when divided into 7 classes:
This dataset is used for training and evaluating object detection models for trash detection in real-world scenes. The TACO trash-detection dataset provides a benchmark for evaluating the performance of trash detection models.
Let us first inspect the distribution of the image annotations in the train set of the TACO dataset over its 7 class categories. Ideally, we'd want the annotations to be divided equally between the categories, but in reality we see that this is not the case, as the following code demonstrates.
def count_string_occurrences(data, target_str):
    count = 0
    # Base case: count occurrences directly inside a string
    if isinstance(data, str):
        count += data.count(target_str)
        return count
    # Recursive case: walk the annotations dictionary
    for key, value in data.items():
        if isinstance(value, dict):
            count += count_string_occurrences(json.dumps(value), target_str)
        elif isinstance(value, list):
            for item in value:
                count += count_string_occurrences(json.dumps(item), target_str)
    return count
categories_dict = {1: "metals_and_plastic", 2: "other", 3: "non_recyclable",
                   4: "glass", 5: "paper", 6: "bio", 7: "unknown"}
target_str_list = [f'"category_id": {x}' for x in list(categories_dict.keys())]
file_path = 'imgs/taco-dataset/annotations/annotations_train.json'
with open(file_path, 'r') as f:
    data = json.load(f)
count_arr = []
for i, target_str in enumerate(target_str_list):
    # Count the annotations of each category in the train annotations file
    count_arr.append(count_string_occurrences(data, target_str))
    # print(f'There are {count_arr[-1]} annotations of "{categories_dict[i+1]}" in the TACO training dataset')
# Define colors for each bar
colors = ['blue', 'green', 'red', 'purple', 'orange', 'brown', 'pink']
# Create a bar plot
plt.bar(range(len(count_arr)), count_arr, tick_label=list(categories_dict.values()), color=colors)
plt.xticks(fontsize=10)
plt.xticks(rotation=315)
# Add bar values on top of bars
for i, val in enumerate(count_arr):
    plt.text(i, val + 0.1, str(val), fontsize=10, ha='center')
plt.xlabel('Classes')
plt.ylabel('Annotations count')
plt.title('TACO dataset class distribution')
plt.show()
When running a training process, the YOLOv7 model plots some key results at the end of the training to help evaluate the model. All training experiments are saved in a folder called "runs" under the project's root folder. These plots are very informative, containing visual results of the test images next to the ground-truth boxes and labels, as well as numeric evaluation over several metrics and a confusion matrix comparing the predicted and true classes.
In the following section of the notebook, we evaluate the model's performance according to the following evaluation methods:
The performance of the models produced in each experiment will be evaluated using the following metrics:
Recall - percentage of relevant objects that were correctly identified by the model out of all the relevant objects that should have been detected. High recall - the model is able to correctly identify most of the relevant objects. Low recall - the model is missing many of the relevant objects.
Recall = True Positives / (True Positives + False Negatives)
Precision - the ratio of true positive detections to the total number of positive detections (true positives plus false positives).
Precision = True Positives / (True Positives + False Positives)
mAP (mean Average Precision) - calculated by measuring the average precision for each class of objects in the dataset and taking the mean of those values. It takes into account both precision and recall, and it is calculated by computing the area under the precision-recall curve. Higher mAP score indicates better performance for the model.
mAP@0.5 - mean average precision of the model's predictions when using an IoU threshold of 0.5 to determine if a prediction is a true positive or a false positive.
mAP@0.5:0.95 - mean average precision of the model's predictions when using a range of intersection over union (IoU) thresholds from 0.5 to 0.95, in steps of 0.05, to determine if a prediction is a true positive or a false positive.
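As a small sanity check on the definitions above, here is a minimal sketch using invented detection counts (the numbers are hypothetical):

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts for one class: 40 correct detections,
# 10 false alarms, 50 missed objects.
tp, fp, fn = 40, 10, 50
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # ~0.444

# The IoU thresholds averaged over by mAP@0.5:0.95 (steps of 0.05):
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
print(thresholds)  # [0.5, 0.55, ..., 0.95]
```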
NOTE - the IoU threshold for training is not the same as the IoU threshold for evaluation! The IoU threshold set for training affects the loss computation, and thus the weights' update process and their final values.
Definition - a table that summarizes the performance of the model in classifying objects into different categories. The rows represent the predicted classes of the objects and the columns represent the true classes of the objects. Each [i, j] cell of the confusion matrix represents the chance that an item of true class j is predicted as an item of class i.
Class of no class - the "Background FN" class column represents the scenario where the model falsely detected a bounding box as trash when there is no trash object inside the bounding box. The "Background FN" class row represents the scenario where there is a trash object that the model didn't detect. For example, if a cell in the "Background FN" class row is 1.0, it means that the class of this column is never detected by the model.
As we've learned in the course, optimizers are algorithms used to adjust the parameters of a neural network during training. They search for the set of parameters that minimizes the cost function. Different optimizers update the parameters differently based on the gradients of the cost function.
We've decided to test different optimizers for YOLOv7's training process. Here's a brief overview on each optimizer:
Adam (Adaptive Moment Estimation): adaptively adjusts the learning rate for each parameter based on past gradients. Adam is a combination of two gradient descent methods, Momentum and RMSProp. Adam is known for its fast convergence and robustness to noisy gradients.
AdamW: a variation of Adam that incorporates weight decay into the update rule to prevent overfitting. Weight decay basically adds a penalty term to the loss function proportional to the L2 norm of the model's parameters.
SGD (Stochastic Gradient Descent): basic optimization algorithm that updates the model parameters based on the negative gradient of the loss function with respect to each parameter. The gradient is estimated using a randomly sampled subset of the training data (mini-batch). SGD is simple and easy to implement, but it can suffer from slow convergence and sensitivity to the learning rate.
Lion: a new optimization algorithm discovered by Google through an automated program-search process. Instead of the usual momentum update, Lion updates the parameters using only the sign of an interpolation between the momentum and the current gradient, which makes it memory-efficient. Lion was meant to be robust to noisy gradients; it is relatively new and has not been widely adopted yet.
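Based on our reading of the Lion paper, a single Lion update step can be sketched in NumPy roughly as follows (the function name and default hyperparameters here are our own choices, not the official implementation):

```python
import numpy as np

def lion_step(w, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: the step direction is only the sign of an
    interpolation between the momentum and the current gradient."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    w = w - lr * (update + wd * w)          # decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad      # momentum update
    return w, m

w, m = np.array([1.0, -1.0]), np.zeros(2)
w, m = lion_step(w, np.array([0.3, -0.2]), m, lr=0.1)
print(w)  # [0.9, -0.9]: every weight moves by exactly lr * sign
```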
Reading about the new LION optimizer by Google encouraged us to test this aspect of the YOLOv7 model and to see how it responds particularly to this optimizer.
illustration of an optimization process
In order to activate different optimizers, we had to change the "optimizer" parameter assignment in "train.py". SGD is the default optimizer used for the baseline model. For Adam, YOLOv7 already has a configuration; we only added "--adam" to the command line inside "train.sh" (see /project/train.sh) like so:
For Lion, we followed the instructions mentioned in its GitHub repository [3] and configured the optimizer parameter of the training process like so:
where pg0 is one of the optimizer's parameter groups.
For AdamW, we used the same default parameters defined in YOLOv7 for Adam (a common configuration for AdamW) and assigned the optimizer parameter of the training process like so:
where pg0 is one of the optimizer's parameter groups.
import matplotlib.pyplot as plt
import numpy as np
results_txt_paths = ['results/training results/base-SGD/results.txt',
'results/training results/adam/results.txt',
'results/training results/adamW/results.txt',
'results/training results/Lion/results.txt']
# fig, ax = plt.subplots(2, 2, figsize=(12, 6), tight_layout=True)
fig, ax = plt.subplots(4, 1, figsize=(15, 20), tight_layout=True)
ax = ax.ravel()
s = ['Precision', 'Recall','mAP@0.5', 'mAP@0.5:0.95']
labels = ['SGD (baseline)', 'Adam', 'AdamW', 'Lion']
for i in range(4):
    for fi, f in enumerate(results_txt_paths):
        results = np.loadtxt(f, usecols=[8, 9, 10, 11], ndmin=2).T
        n = results.shape[1]  # number of logged epochs
        x = range(0, n)
        y = results[i, x]
        if i in [0, 1, 2, 5, 6, 7]:
            y[y == 0] = np.nan  # don't show zero values
        label = labels[fi]
        ax[i].plot(x, y, marker='.', label=label, linewidth=2, markersize=8)
    ax[i].set_title(s[i], fontsize=18)
    ax[i].legend()
listOfImageNames = ['results/training results/base-SGD/confusion_matrix.png',
'results/training results/adam/confusion_matrix.png',
'results/training results/adamW/confusion_matrix.png',
'results/training results/Lion/confusion_matrix.png']
titles = ['SGD (baseline)', 'Adam', 'AdamW', 'Lion']
_, axs = plt.subplots(len(titles), 1, figsize=(40, 40))
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=18)
plt.show()
listOfImageNames = ['results/training results/base-SGD/test_batch0_labels.jpg',
'results/training results/base-SGD/test_batch0_pred.jpg',
'results/training results/adam/test_batch0_labels.jpg',
'results/training results/adam/test_batch0_pred.jpg',
'results/training results/adamW/test_batch0_labels.jpg',
'results/training results/adamW/test_batch0_pred.jpg',
'results/training results/LION/test_batch0_labels.jpg',
'results/training results/LION/test_batch0_pred.jpg']
titles = ['SGD (baseline) ground-truth','SGD (baseline) prediction' ,
'Adam ground-truth','Adam prediction',
'AdamW ground-truth', 'AdamW prediction',
'Lion ground-truth', 'Lion prediction']
_, axs = plt.subplots(len(titles), 1, figsize=(40,40), tight_layout=True)
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=16)
plt.show()
SGD - Taking all 4 plots into account, the baseline SGD model achieves the best performance on the task. This result isn't surprising, since the YOLOv7 model hyperparameters were tuned with the SGD optimizer.
Adam - In contrast to SGD, the Adam optimizer performed the worst, while we expected it to have rates similar to SGD because it is also implemented in YOLOv7 as an optimizer of choice (though not the default).
Lion - The Lion model is the closest in performance to the SGD model (as seen clearly in the mAP plots) and even has a better recall value than SGD during most of the training epochs. Considering that we used the default Lion hyperparameters without tweaking, and that no hyperparameter evolution of YOLOv7 was performed for Lion, these results are quite promising and definitely encourage further investigation.
AdamW - Since AdamW is an upgrade of the Adam optimizer, as expected, the AdamW optimizer achieved close but slightly better results compared to Adam.
As expected from the TACO dataset inspection in section 3.1., none of the models are able to detect the classes with a small amount of data: "other", "glass", "bio". In addition, the "unknown" class is probably very versatile and contains no learnable patterns.
Overall, the SGD model is the only model that is able to correctly detect the "paper" and "non-recyclable" classes, though only at low rates of 0.19 and 0.15 respectively.
For the "metals and plastic" class, SGD also performs best and detects this class correctly in 40% of the cases. The Lion model achieves close results with 35% for the "metals and plastic" class. AdamW and especially Adam perform poorly on all classes.
As expected, the visual results coincide with the confusion matrix results, showing behaviors we observed such as:
The "unknown" category cannot be detected by any model.
The Adam model barely produces detections, since all classes cannot be predicted (classified as "background FN").
Metals and plastic is the predominant category for all other models, and SGD can also predict some examples as "non-recyclable".
In terms of detecting the bounding boxes, Lion and SGD show superior abilities (as expected).
As we've learned in the course and implemented in HW2, Intersection over Union (IoU) is an important measure of accuracy for deep learning models that predict bounding boxes.
In the following experiment, we use the baseline model's parameters (described in the previous section) and tweak only the IoU threshold parameter to see how it affects the model's performance and accuracy.
The Intersection over Union (IoU) threshold is a parameter used in object detection tasks to measure the accuracy of the predicted bounding boxes around objects. IoU is calculated by taking the intersection area of the predicted and ground truth bounding boxes and dividing it by the union area of the two boxes. The IoU score is between 0 and 1, where a score of 1 indicates a perfect match between the predicted and ground truth bounding boxes. The IoU threshold is a value that determines whether a predicted bounding box is considered a true positive detection or a false positive: if the IoU score between the predicted and ground truth bounding boxes is above the threshold, the prediction is a true positive; otherwise, it is a false positive.
If we set the IoU threshold too high, we may miss some true positive detections, because small deviations between predicted and ground truth bounding boxes can push the IoU score below the threshold. Setting the IoU threshold too low may result in accepting false positive detections. Therefore, finding an optimal IoU threshold is important to balance the trade-off between maximizing accuracy (and keeping the accuracy measure valid, in a sense) and minimizing false positives.
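A minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) format might look like this (a sketch for illustration, not the YOLOv7 implementation; the example boxes are invented):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes that overlap on half their width
pred, gt = (0, 0, 10, 10), (5, 0, 15, 10)
score = iou(pred, gt)
print(score)  # ~0.333
# Whether the prediction counts as a true positive depends on the threshold:
print(score >= 0.2, score >= 0.45)  # True False
```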
For this reason, we've decided to play around with the effect of the IoU and see first hand its effect on the YOLOv7 performance.
The only code modification necessary for this experiment was to set the IoU threshold variable (iou_t), inside the yaml file of hyperparameters that the baseline model loads (hyp.scratch.p5.yaml), to the IoU threshold values we experimented with:
iou_t values = [0.2 (baseline), 0.25, 0.35, 0.45, 0.55]
results_txt_paths = ['results/training results/base-SGD/results.txt',
'results/training results/iou_25/results.txt',
'results/training results/iou_35/results.txt',
'results/training results/iou_45/results.txt',
'results/training results/iou_55/results.txt']
fig, ax = plt.subplots(4, 1, figsize=(15, 20), tight_layout=True)
ax = ax.ravel()
s = ['Precision', 'Recall','mAP@0.5', 'mAP@0.5:0.95']
labels = ['iou=0.2 (baseline)', 'iou=0.25', 'iou=0.35', 'iou=0.45', 'iou=0.55']
for i in range(4):
    for fi, f in enumerate(results_txt_paths):
        results = np.loadtxt(f, usecols=[8, 9, 10, 11], ndmin=2).T
        n = results.shape[1]  # number of logged epochs
        x = range(0, n)
        y = results[i, x]
        if i in [0, 1, 2, 5, 6, 7]:
            y[y == 0] = np.nan  # don't show zero values
        label = labels[fi]
        ax[i].plot(x, y, marker='.', label=label, linewidth=2, markersize=8)
    ax[i].set_title(s[i], fontsize=18)
    ax[i].legend()
listOfImageNames = ['results/training results/base-SGD/confusion_matrix.png',
'results/training results/iou_25/confusion_matrix.png',
'results/training results/iou_35/confusion_matrix.png',
'results/training results/iou_45/confusion_matrix.png',
'results/training results/iou_55/confusion_matrix.png']
titles = ['iou=0.2 (baseline)', 'iou=0.25', 'iou=0.35', 'iou=0.45', 'iou=0.55']
_, axs = plt.subplots(len(titles), 1, figsize=(40, 40))
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=17)
plt.show()
listOfImageNames = ['results/training results/base-SGD/test_batch0_labels.jpg',
'results/training results/base-SGD/test_batch0_pred.jpg',
'results/training results/iou_25/test_batch0_labels.jpg',
'results/training results/iou_25/test_batch0_pred.jpg',
'results/training results/iou_35/test_batch0_labels.jpg',
'results/training results/iou_35/test_batch0_pred.jpg',
'results/training results/iou_45/test_batch0_labels.jpg',
'results/training results/iou_45/test_batch0_pred.jpg',
'results/training results/iou_55/test_batch0_labels.jpg',
'results/training results/iou_55/test_batch0_pred.jpg']
titles = ['iou=0.2 ground-truth', 'iou=0.2 prediction',
'iou=0.25 ground-truth','iou=0.25 prediction',
'iou=0.35 ground-truth', 'iou=0.35 prediction',
'iou=0.45 ground-truth', 'iou=0.45 prediction',
'iou=0.55 ground-truth', 'iou=0.55 prediction']
_, axs = plt.subplots(len(titles), 1, figsize=(40,40), tight_layout=True)
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=16)
plt.show()
Here, surprisingly, the only model that demonstrates substantially worse performance is iou=0.25, whereas the higher IoU values generate results similar to the baseline IoU value of 0.2.
Explanation - during the first 2 epochs all models behave the same; from that point, the gradient direction of the IoU=0.25 model probably diverges and converges to a sub-optimal solution relative to the other IoU models.
In line with the conclusions derived from the "Evaluation metrics - analysis" above, IoU=0.25 performs worse by a large margin on the "metals and plastic", "non-recyclable" and "paper" classes.
As already described above, the other classes suffer from a lack of data, and the "unknown" category (probably) doesn't carry learnable patterns, so we don't expect the models to classify them well.
IoU=0.45 achieves the best accuracy, and IoU=0.35 is only slightly behind. Both IoU=0.45 and IoU=0.35 are better than the baseline IoU=0.2 model, especially on the paper class - but not by much.
In general, the models (besides IoU=0.25) are very confident about their predictions, meaning that if they detected an object, its detection is correct with high probability. This result is supported by the fact that the majority of the "metals and plastic", "non-recyclable" and "paper" predictions are "background FN", meaning the model didn't detect the object, and the second highest value belongs to the correct class.
For the reasons explained in the analysis above, all models besides IoU=0.25 produce similar detections.
IoU=0.25 produces fewer bounding boxes, supported by its confusion matrix where most class predictions are "Background FN" - meaning the model wasn't able to detect them.
Fine-tuning is a transfer learning technique, where a pre-trained model is further trained on a new dataset or task. This is done by updating the pre-trained model's weights using a smaller, task-specific dataset (like the TACO dataset for trash detection in our case) to learn specific features relevant to the new task. Fine-tuning can improve the performance of the model and reduce the training time required compared to training from scratch.
Layer freezing is a technique used during fine-tuning where certain layers in the pre-trained model are kept fixed and not updated during training. This is done to preserve the learned features in those layers and prevent overfitting on the new dataset. Typically, the lower layers are frozen, as they capture low-level features that are relevant across different tasks, while higher layers are fine-tuned to learn task-specific features.
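A toy NumPy sketch of this idea: a plain SGD step that simply skips the layers marked as frozen (the layer names and function are hypothetical; in YOLOv7 itself freezing is done by setting the corresponding parameters' requires_grad to False):

```python
import numpy as np

def sgd_with_freezing(weights, grads, lr=0.01, freeze=()):
    """Apply an SGD step to every layer except the frozen ones,
    whose pretrained weights are kept fixed."""
    return {name: w if name in freeze else w - lr * grads[name]
            for name, w in weights.items()}

# Hypothetical two-layer model: freeze the backbone, fine-tune the head
weights = {"backbone.0": np.array([1.0]), "head.0": np.array([1.0])}
grads = {"backbone.0": np.array([10.0]), "head.0": np.array([10.0])}
new_w = sgd_with_freezing(weights, grads, lr=0.1, freeze={"backbone.0"})
print(new_w["backbone.0"], new_w["head.0"])  # [1.] [0.]
```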
After examining the yaml file of the YOLOv7 architecture (yolov7.yaml), we decided to freeze the backbone layers (the first 51 layers) for the first experiment and the first 3 layers for the second experiment. Freezing deeper layers would result in underfitting to the task-specific (trash detection) problem. To do so, we changed the '--freeze' argument of the train argument parser inside the "__main__" section of "train.py".
# Experiment 1 - replace the default (no freezing):
parser.add_argument('--freeze', nargs='+', type=int, default=[0])
# with freezing the backbone (first 51 layers):
parser.add_argument('--freeze', nargs='+', type=int, default=np.arange(51))
# Experiment 2 - replace the default (no freezing):
parser.add_argument('--freeze', nargs='+', type=int, default=[0])
# with freezing only the first 3 layers:
parser.add_argument('--freeze', nargs='+', type=int, default=np.arange(3))
import matplotlib.pyplot as plt
import numpy as np
results_txt_paths = ['results/training results/base-SGD/results.txt',
'results/training results/freeze_3layers/results.txt',
'results/training results/freeze_backbone/results.txt']
fig, ax = plt.subplots(2, 2, figsize=(12, 6), tight_layout=True)
ax = ax.ravel()
s = ['Precision', 'Recall','mAP@0.5', 'mAP@0.5:0.95']
labels = ['no freeze (baseline)', 'freeze 3 first layers', 'freeze backbone']
for i in range(4):
    for fi, f in enumerate(results_txt_paths):
        results = np.loadtxt(f, usecols=[8, 9, 10, 11], ndmin=2).T
        n = results.shape[1]  # number of logged epochs
        x = range(0, n)
        y = results[i, x]
        if i in [0, 1, 2, 5, 6, 7]:
            y[y == 0] = np.nan  # don't show zero values
        label = labels[fi]
        ax[i].plot(x, y, marker='.', label=label, linewidth=2, markersize=8)
    ax[i].set_title(s[i])
    ax[i].legend()
listOfImageNames = ['results/training results/base-SGD/confusion_matrix.png',
'results/training results/freeze_3layers/confusion_matrix.png',
'results/training results/freeze_backbone/confusion_matrix.png']
titles = ['no freeze (baseline)', 'freeze 3 first layers', 'freeze backbone']
_, axs = plt.subplots(len(titles), 1, figsize=(40, 40))
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=18)
plt.show()
listOfImageNames = ['results/training results/base-SGD/test_batch0_labels.jpg',
'results/training results/base-SGD/test_batch0_pred.jpg',
'results/training results/freeze_3layers/test_batch0_labels.jpg',
'results/training results/freeze_3layers/test_batch0_pred.jpg',
'results/training results/freeze_backbone/test_batch0_labels.jpg',
'results/training results/freeze_backbone/test_batch0_pred.jpg']
titles = ['no freeze (baseline) ground-truth','no freeze (baseline) prediction' ,
'freeze 3 first layers ground-truth','freeze 3 first layers prediction',
'freeze backbone ground-truth', 'freeze backbone prediction']
_, axs = plt.subplots(len(titles), 1, figsize=(40,40), tight_layout=True)
axs = axs.ravel()
for img_path, ax, title in zip(listOfImageNames, axs, titles):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert for matplotlib
    ax.imshow(img)
    ax.axis("off")
    ax.set_title(title, fontsize=16)
plt.show()